Building an efficient curation workflow for the Arabidopsis literature corpus
نویسندگان
چکیده
TAIR (The Arabidopsis Information Resource) is the model organism database (MOD) for Arabidopsis thaliana, a model plant with a literature corpus of about 39 000 articles in PubMed, with over 4300 new articles added in 2011. We have developed a literature curation workflow incorporating both automated and manual elements to cope with this flood of new research articles. The current workflow can be divided into two phases: article selection and curation. Structured controlled vocabularies, such as the Gene Ontology and Plant Ontology are used to capture free text information in the literature as succinct ontology-based annotations suitable for the application of computational analysis methods. We also describe our curation platform and the use of text mining tools in our workflow. Database URL: www.arabidopsis.org
منابع مشابه
Text mining in the biocuration workflow: applications for literature curation at WormBase, dictyBase and TAIR
WormBase, dictyBase and The Arabidopsis Information Resource (TAIR) are model organism databases containing information about Caenorhabditis elegans and other nematodes, the social amoeba Dictyostelium discoideum and related Dictyostelids and the flowering plant Arabidopsis thaliana, respectively. Each database curates multiple data types from the primary research literature. In this article, w...
متن کاملStudy of the foundation, models and issues of research data curation and management in scientific and academic environments
Background and Aim: The purpose of this paper is to study, identifying and discuss the foundation and concepts, models and frameworks, dimensions and challenges of research data curation and management in scientific and academic environments. Method: This article is a review article and library method was used to collect scientific and research texts in this field. In this research, external an...
متن کاملConstruction of biological networks from unstructured information based on a semi-automated curation workflow
Capture and representation of scientific knowledge in a structured format are essential to improve the understanding of biological mechanisms involved in complex diseases. Biological knowledge and knowledge about standardized terminologies are difficult to capture from literature in a usable form. A semi-automated knowledge extraction workflow is presented that was developed to allow users to e...
متن کاملUsing ODIN for a PharmGKB revalidation experiment
The need for efficient text-mining tools that support curation of the biomedical literature is ever increasing. In this article, we describe an experiment aimed at verifying whether a text-mining tool capable of extracting meaningful relationships among domain entities can be successfully integrated into the curation workflow of a major biological database. We evaluate in particular (i) the usa...
متن کاملAn automated framework for QSAR model building
BACKGROUND In-silico quantitative structure-activity relationship (QSAR) models based tools are widely used to screen huge databases of compounds in order to determine the biological properties of chemical molecules based on their chemical structure. With the passage of time, the exponentially growing amount of synthesized and known chemicals data demands computationally efficient automated QSA...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره 2012 شماره
صفحات -
تاریخ انتشار 2012